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Abstract 

Up to the moment there are two known algorithms of sector decomposition: an 
original private algorithm of Binoth and Heinrich and an algorithm made public 
last year by Bogner and Weinzierl. We present a new program performing the sector 
decomposition and integrating the expression afterwards. The program takes a set 
of propagators and a set of indices as input and returns the epsilon-expansion of 
the corresponding integral. 
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Classification: 4.4 Feynman diagrams, 4.12 Other Numerical Methods, 5 Computer 
Algebra, 6.5 Software including Parallel Algorithms 
External routines/libraries: QLink [1], Vegas [2] 

Nature of problem: The sector decomposition approach to evaluating Feynman in- 
tegrals falls apart into the sector decomposition itself, where one has to minimize 
the number of sectors; the pole resolution and epsilon expansion; and the numerical 
integration of the resulting expression 

Solution method: The sector decomposition is based on a new strategy. The sec- 
tor decomposition, pole resolution and epsilon-expansion are performed in Wolfram 
Mathematica 6.0 [3]. The data is stored on hard disk via a special program, QLink 
[1] . The expression for integration is passed to the C-part of the code, that parses 
the string and performs the integration by the Vegas algorithm [2] . This part of the 
evaluation is perfectly parallelized on multi- kernel computers. 
Restrictions: The complexity of the problem is mostly restricted by CPU time re- 
quired to perform the evaluation of the integral, however there is currently a limit 
of maximum 11 positive indices in the integral; this restriction is to be removed in 
future versions of the code. 

Running time: depends on the complexity of the problem 
References: 

[l] |http://qlink08.sourceforge.net[ open source 

[2] G. P. Lepage, the Cornell preprint CLNS-80/447,1980. 



[3] http://www.wolfram.com/products/mathematica/index.html 
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LONG WRITE-UP 



1 Introduction 

Sector decomposition in alpha (Feynman) parametric representations of Feyn- 
man integrals originated as a tool for analyzing the convergence and proving 
theorems on renormalization and asymptotic expansions of Feynman integrals 
PQ. The goal of this procedure is to decompose the initial integration domain 
into appropriate subdomains (sectors) and introduce, in each sector, new vari- 
ables in such a way that the integrand factorizes, i.e. becomes equal to a 
monomial in new variables times a non-singular function. General mathemati- 
cal results on Feynman integrals exist up to now only in the case where all the 
external momenta are Euclidean. However, in practice, one often deals with 
Feynman integrals on a mass shell or at threshold. 

Sector decomposition became a practical tool for evaluating Feynman integrals 
numerically with the growth of computer power. Practical sector decomposi- 
tion was introduced in [2] and was used to verify several analytical results for 
multiloop Feynman integrals, including three-loop [3"fHlo] results. For a good 
review on the sector decomposition we refer to Ref . [6] . 

The first public algorithm has been published by Bogner and Weinzierl [7]. 
It proposes four strategies for the sector decomposition; three of those are 
guaranteed to terminate. Strategy A [8] is conceptually the simplest, however 
it results in too many sectors; strategy B (Spivakovsky's strategy) has been 
described in [9], strategy C has been created by Bogner and Weinzierl and is 
an improvement of strategy B. Strategy X is likely to share the ideas of Binoth 
and Heinrich j2] It is not guaranteed to terminate but results in less sectors, 
than the other strategies. 

This paper presents a new algorithm FIESTA for evaluating Feynman integrals 
with the use of sector decomposition. We reproduced the strategies A, B and 
X in the code and presented a new strategy S, based on original ideas. Our 
strategy is also guaranteed to terminate, but results in more sectors than 
strategy X, however the difference is between S and X is not so significant 
as with B and X. We provide an example where the strategy X does not 
terminate. For details and the table comparing the various strategies please 
refer to Appendix [Al 

Our algorithm can successfully work with a single processor, however it is 
ready to work in a parallel environment, and the use of multi-kernel processors 
and multi-processor computers significantly speeds up the calculation. 
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To benchmark the power of FIESTA, we evaluated the triple box diagram [3] 
(which starts from e~ 6 poles) up to the finite part. We used the strategy 
X and took the symmetries of the diagram into account (see section [6] for 
the syntax). A Xeon double processor computer was used (8 cores Intel(R) 
Xeon(R) CPU E5472, 3.00GHz, 1600FSB, 2x6MB L2 cache, 32 GB RAM, 
4.6TB local hard drive). The evaluation took less than 3 days (62.4 hours); 
the job used maximally 1GB of RAM and 40GB on hard drive. All the answers 
are within the 1 percent error estimate from the existing analytical result. 

We tested FIESTA only on Windows and Linux platforms but there are no 
Linux-specifics; it should be possible to compile the sources on any Unix plat- 
form. However, we provide the pre-compiled binary files only for Windows on 
x86 and Linux on x86-64. 



2 Theoretical background 

FIESTA calculates Feynman integrals with the sector decomposition approach. 
It is based on the a-representation of Feynman integrals. After performing 
Dirac and Lorentz algebra one is left with a scalar dimensionally regularized 
Feynman integral [TO] 



where d = 4 — 2e is the space-time dimension, CLyi £1X6 indices, / is the number 
of loops and 1/E n are propagators. We work in Minkowski space where the 
standard propagators are the form l/(— p 2 + m? — iO). Other propagators are 
permitted, for example, l/(v • k ± iO) may appear in diagrams contributing to 
static quark potentials or in HQET where v is the quark velocity. Substituting 



changing the integration order, performing the integration over loop momenta, 
replacing a\ with Xii] and integrating over r\ one arrives at the following formula 
(see e.g. [UJ): 

F(ai, . . . ,a n ) = 




(1) 




(2) 



T(A - ld/2) 
n U T(a 3 ) 



xj>0 



1 



dxi . . . dx n 5 1 1 — 



i=l 




) 



jjA-(l+l)d/2 



pA-ld/2 



(3) 
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where A = Yh=x a n and U and F are constructively defined polynomials of Xj. 
The formula ([3]) has no sense if some of the indices are non-positive integers, 
so in case of those the integration is performed according to the rule 

I dX Yia) f{x) = f (0) 
o ^ ' 

where a is a non-positive integer. 

After performing the decomposition of the integration region into the so-called 
primary sectors [2] and making a variable replacement, one results in a linear 
combination of integrals of the following form: 

} I \ jjA-(l+l)d/2 

I dXi...dx n , I] 2 ? pA-ld/2 ( 4 ) 

x J= V =1 I 

If the functions uA F A-id/2 2 had no singularities in e, one would be able to per- 
form the expansion in e and perform the numerical integration afterwards. 
However, in general one has to resolve the singularities first, which is not pos- 
sible for general U and F. Thus, one starts a process the sector decomposition 
aiming to end with a sum of similar expressions, but with new functions U and 
F which have no singularities (all the singularities are now due to the part 

n^i^y )• Obviously it is a good idea to make the sector decomposition 
process constructive and to end with a minimally possible number of sectors. 
The way sector decomposition is performed is called a sector decomposition 
strategy and is an essential part of the algorithm. 

After performing the sector decomposition one can resolve the singularities by 
evaluating the first terms of the Taylor series: in those terms one integration 
is taken analytically, and the remainder has no singularities. Afterwards the 
^-expansion can be performed and finally one can do the numerical integration 
and return the result. 

Please keep in mind that this approach works only using numerical integration: 
the values for all invariants should be substituted at the very early stage, after 
generating the functions U and F. 



3 Overview of the software structure 

FIESTA is written in Wolfram Mathematica 6.0 and C. The user is not sup- 
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posed to use the C part directly as it is launched from Mathematica via the 
Mathlink protocol. The C part is performing the numerical integration with 
the use of the Vegas [T31[T4] algorithm implemented as a FORTRAN program. 
The Mathematica part can work independently, however the C- integration is 
much more powerful than the Mathematica built-in one. Hence to calculate 
complicated integrals one should use the C-integration. 



4 Description of the individual software components 

The Mathematica part is a .m file that is loaded into Mathematica simply by 



The Mathematica part of the program takes as input the set of loop momenta, 
the set of propagators, the set of indices and the required order of e-expansion 
to be evaluated. It performs the differentiation (in case of negative indices), 
the sector decomposition, the resolution of singularities, the £-expansion and 
prepares the expressions for the numerical integration. The numerical integra- 
tion can both be performed by Mathematica or by the C-part of FIESTA. The 
Mathematica integration has the advantage of being able to work with com- 
plex numbers (in future version this feature will also be added to the C-part), 
however, the C-integration is much more powerful and should be used if one 
goes for complicated integrals. 

The C-integration is called from Mathematica via the Mathlink protocol. This 
part of the algorithm can perfectly work in a parallel environment: one sim- 
ply has to specify the number of copies of CIntegrate to be launched. The 
Mathematica part distributes the integration tasks between those copies and 
collects the result, preparing the expression of the next order at the same time. 

The C-integration comes as a set of source files and should be compiled first. 
The binary files of CIntegrate for Linux and Windows can be downloaded 
from the web-page of the project 

http://www-ttp.particle.uni-karlsruhe.de/~asmirnov/FIESTAl.htm 

The C-integration is using Vegas [T3lfl4] to perform the integration. It takes 
a string with the expression from Mathematica as input, and performs an 
internal compilation of this string to be able to evaluate the expression fast 
afterwards. No external compiler is required and no files are used for data 
exchange. 

In complicated cases one should use the QLink program [15] to store intermedi- 

1 The paths to the CIntegrate binary, to the QLink[T5] and the database should 
be specified beforehand; see section [5] for details. 
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atedata from Mathematica on disk instead of RAM. QLink can be downloaded 
as a binary or as a source (it comes under the terms of GNU GPLv2) . 



5 Installation instructions 

In order to install FIESTA, the user has to download the installation package 
from the following URL: 

|http: / / www-ttp.particle.uni-karlsruhe.de/^asmirnov/ data/FIESTA.tar.gz 
unpack it and follow the instructions in the file INSTALL. 

The Mathematica part of FIESTA requires almost no installation, one only 
needs to copy the FIESTA_1 . . .m file and edit the default paths QLinkPath, 
CIntegratePath and DataPath in this file, e.g.: 

• QLinkPath=" /home/user/QLink/QLink" ; 

• CIntegratePath="/home/user/FIESTA/CIntegrate"; 

• DataPath="/tmp/user/temp"; 

Here QLinkPath is a path to the executable QLink file, CIntegratePath is 
a path to the executable CIntegrate file, and DataPath is a path to the 
database directory. For the Windows system, these paths should look like 

• QLinkPath= " C : /programs/QLink/QLink . exe '{El; 

• CIntegratePath="C : /programs/FIESTA/CIntegrate . exe"; 

• DataPath="D:/temp"; 

Note, the program will create a big 10 traffic to the directory DataPath, thus 
it is better to put this directory on a fast local disk. 

Alternatively, one can specify all these paths manually after loading the file 
FIESTA_1.0.0.m into Mathematica. 

Please note that the code requires Wolfram Mathematica 6.0 to be installed 
and will not work correctly under lower versions of Mathematica. 

In order to work with nontrivial integrals, the user must install QLink and 
the C-part of FIESTA, the CIntegrate program. The QLink [TJ] can be down- 
loaded as a binary file or compiled from the sources. If the user decides to use 
pre-compiled CIntegrate executable file, he has to place the file to some lo- 
cation and edit the paths in the file FIESTA_1 . 0.O.m as it is described above. 
If the user wants to compile the executable file himself he must have the 
Mathematica Developer Kit installed on his computer. 

2 Mathematica uses normal slashes for paths both in Unix and Windows. 
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Under Unix, the user must edit the file comdef .h to be sure that the macro 
WIN is not defined, i.e., comment the macro definition #def ine WIN 1. Then 
the self-explanatory file Makefile can be used by means of running the "make" 
command (the path to the Mathematica Developer Kit libraries should be 
provided). As a result, the binary file CIntegrate appears. 

For the Windows installation, the user must edit the file comdef . h to ensure 
that the macro WIN is defined, #define WIN 1. The Windows makefile is 
provided in the package; it is named "CIntegrate .mak" and is used with the 
"nmake /f CIntegrate. mak" ; To run this command one has to have the Mi- 
crosoft Visual C++ installed (an express edition of Microsoft Visual C++ can 
be downloaded for free from the Microsoft site). As an addition to the pro- 
grams and packages mentioned above, the user should install the Microsoft 
Platform Software Developer Kit (It can also be downloaded freely after vali- 
dating a genuine copy of Microsoft Windows) and the Intel Fortran compiler 
(an evaluation copy is also available form the Intel site). 



6 Test run description 

To run FIESTA load the FIESTA_1 . . .m into Wolfram Mathematica 6.0. To 
evaluate a Feynman integral one has to use the command 

SDEvaluate [{U,F, 1} , indices , order] , 

where U and F are the functions from formula (Ej), 1 is the number of loops, 
indices is the set of indices and order is the required order of the e-expansion. 

To avoid manual construction of U and F one can use a build-in function UF 
and launch the evaluation as 

SDEvaluate [UF [loop_moment a, propagators , subst] , indices , order] , 

where subst is a set of substitutions for external momenta, masses and other 
values (note, the code performs numerical integrations. Thus the functions U 
and F should not depend on any external values). 

Example: 

SDEvaluate [UF [{k} , {-k 2 ,- (k+pi) 2 , -(k+pi+p 2 ) 2 , -(k+pi+p 2 +p 4 ) 2 } , 
{P? ^0>P2 ->0,p2 ->0, pi p 2 ^-S/2,p 2 p 4 ^-T/2, Pl p 4 ^(S+T)/2, 
S^3,T^ 1}] , {1,1,1, 1},0] 

performs the evaluation of the massless on-shell box diagram (Fig. HD where 
the Maldestam variables are equal to S = 3 and T = 1. A complete log of this 



S 



example follows in Appendix iBl 
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Fig. 1. The massless on-shell box diagram. 
To clear the results from memory use the ClearResults [] command. 

The code has the following options: 

• UsingC: specifies whether the C-integration should be used; default value: 
True; 

• CIntegratePath: path to the CIntegrate binary; 

• NumberOf Links: number of the CIntegrate programs to be launched; de- 
fault value: 1; 

• UsingQLink: specifies whether QLink should be used to store data on disk; 
works only with UsingC=True; default value: True; please refer to Ap- 
pendix [D] on the use of QLink; 

• QLinkPath: path to the QLink binary; 

• DataPath: path to the place where QLink stores the data; for example, 
if DataPath=/temp/temp, then the code three directories: /temp/tempi, 
/temp/temp2 and /temp/temp3; those directories will be erased if existent; 
the directory /temp should exist; 

• PrepareStringsWhilelntegrating: specifies whether the code should pre- 
pare expressions of next order in epsilon at the same time with integrating; 
works only with UsingC=True and UsingQLink=True; should be set to False 
on slow machines; default value: True; 

• IntegrationCut: the actual low boundary of the integration domain (in- 
stead of zero); default value: 0; 

• IfCut and VarExpansionDegree: if IfCut is nonzero then the expression 
is expanded up to order VarExpansionDegree over some of the integration 
variables; the integration function is evaluated exactly if the integration 
variable is greater that IfCut, otherwise the expansion is taken instead; the 
default value of IfCut is zero, the default value of VarExpansionDegree is 

i; 

• ForceMixingOf Sectors: if set to True, forces the code to perform one in- 
tegration only for a given order of e; default value is False; please refer to 
Appendix [Gj 
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• PrimarySectorCoef f icients: The usage of this option allows to take the 
symmetries of the diagram into account. If the diagram has symmetries, 
then the primary sectors corresponding to symmetrical lines result in equal 
contributions to the integration result. Hence it makes sense to speed up 
the calculation by specifying the coefficients before the primary sector in- 
tegrands. For example, if two lines in the diagram are symmetrical, one 
can have a zero coefficient before one of those and 2 before the second. 
PrimarySectorCoef f icients defines those coefficients if set; the size of 
this list should be equal to the number of primary sectors; 

• STRATEGY: defines which sector decomposition strategy is used; STRATEGY_0 
is not exactly a strategy, but an instruction not to perform the sector decom- 
position; STRATEGY_A and STRATEGY_B are the two strategies from Ref. [7J 
guaranteed to terminate; STRATEGY_S (default value) is our strategy, pro- 
ducing better results than the preceding ones; STRATEGY_X is an heuristic 
strategy from [7J ; likely to share the ideas of Binoth and Heinrich [2] : pow- 
erful but not guaranteed to terminate; 

• ResolveNegativeTerms: defines whether an attempt to get rid of negative 
terms in F should be performed; default value: True; see Appendix [Cj 

• VegasSettings: a list of 4 numbers {pi,ri,p2,r 2 }, instructs the code to 
evaluate the integral T\ times with pi sampling points for warm-up and 
T2 times with p2 sampling points for the final integration; default value: 
{10000,5,100000,15}. 

The code sometimes returns Indeterminate as a result. There are the follow- 
ing reasons possible: 

• Negative function F provided; 

This might happen if the propagators in the input for UF are of the form 
— 1/(— p 2 + m 2 — iO) instead of l/(— p 2 + m 2 — iO); 
Solution: change the sign of the propagators; 

• Complex number as an answer; 

If the integrand has complex values, then the C integration cannot be 
used with the current version of FIESTA. If complex numbers are explicitly 
encountered in the integrand, the integration will produce a syntax error. 
Otherwise, if the expressions like log(— 1) are encountered in the integrand, 
it the integration program returns Indeterminate; 

Solution: One can try to use the UsingC=False option, however the built- 
in Mathematica integration cannot work in complicated examples and we 
don't guarantee correct results if complex numbers are used. The support 
for complex numbers will be provided in future versions of FIESTA; 

• Special singularities; 

The standard sector decomposition approach works only for singularities 
for small values of variables. Our code can resolve some extra singulari- 
ties (see Appendix [C]) , but surely all types of singularities could not be be 
covered. 



10 



Solution: One can try different values of the ResolveNegativeTerms op- 
tion. If the singularity is of a special type, one can try contact the authors 
in order to try to make the resolution of singularities of this type automatic. 
• Numerical instability; 

If all the singularities are for small values of integration variables, they 
are resolved in the code. However, the numerical integration may experience 
problems with small values of variables (see Appendix [El). 

Solution: One should use the IntegrationCut or IfCut together with 
VarExpansionDegree options. See Appendix [El for details. 
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Appendix 

A Sector decomposition strategies 

The sector decomposition goal may be reformulated the following way: sup- 
pose we have a polynomial P of n variables and a unitary hypercube U as 
integration area. The task is to separate the integration region into a number 
of sectors so that each of those sectors after a variable replacement turns into 
a unitary cube and P in those variables turns into a product of a monomial M 
and a polynomial of a proper form, i.e. having 1 as one of the summands. This 
requirement is often sufficient for the polynomial P to have no zeros apart 
from the ones due to M (this is valid for a wide class of integrals including 
those with the same sign for all kinematic invariants). 

The sector decomposition is usually performed recursively; for a given polyno- 
mial P one has an algorithm to define how to perform a single sector decom- 
position step. Such an algorithm is also called a sector decomposition strategy. 

All known strategies separate U into parts the following way: a subset / = 
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. . . i m } of {1, ... , n} is chosen and m sectors are defined by 
Si = {(xi, . . . , x n ) e U\x k > x ik Mi k e I}, 

(1 = 1... m). The variable replacement in Si is defined by 
Xi = x • Mi g" I 

x k = X ii 

x h = x 'i, x i k Vi e /, A; ^ Z 

It is easy to verify that the integration region in the new variables x' is again 
a unitary cube. 

We are extending the set of possible regions and variable replacements the 
following way: for any vector (v±, . . . , v n ) in the positive quadrant with at least 
two non-zero coordinates we consider the set / = {i\v,i ^ 0} and separate U 
into m parts by 

Si = {(xi, . . . , x n ) e U\x^ 1 > zJ*Vi fc e I}, 
where . . . i m } = I and the exponents aj are defined by 
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The variable replacement in Si is defined by 
Xi = x'iVi g" / 

X H ~ X ii 

x ik = x'i^x'^ Viel,k=£l. 

It is only a little bit more complicated to verify that the integration region in 
the variables x' is still the unite hypercube. 
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Now let us describe how we choose the vector v. We consider the set of weights 
W of the polynomial P defined as the set of all possible ( Ct\j . . . , CL n ) where 
cx°y ■ ■ ■ is one of the monomials of P. We will say that a weight is higher 
than another one if their difference is a set of non-negative numbers. If P had 
a unique lowest weight, a monomial could be factorized out and we would rep- 
resent P in the required form. Hence it becomes reasonable to try to minimize 
the number of lowest weights of P. We consider the convex hull of W and 
choose one of its facets F visible from the origin. Now v is chosen to be the 
normal vector to F. 

The reason of choosing such a vector is rather simple: the vectors formed by 
the weights of the vectors x' ik for k ^ I are orthogonal to v, therefore are 
lying in the facet F. Hence there is a good chance that after a single sector 
decomposition step only one of the vertices of F is left to be a lowest weight. 

Such a sector decomposition step is guaranteed not to increase the norm de- 
fined in the strategy A of |7] (the Zeillinger strategy [8]). If we can't find a 
facet such that the corresponding step decreases the norm, we fall back to the 
strategy A and perform one step. Hence our strategy is also guaranteed to 
terminate. Practice shows that the strategy A steps are applied in no more 
than in five percent cases. 

The following table shows the number of sectors produced by different strate- 
gies on various examples: 



Diagram 


A 


B 


C 


S 


X 


Box 


12 


12 


12 


12 


12 


Double box 


755 


586 


586 


362 


293 


Triple box 


M 


114256 


114256 


22657 


10155 


D420 


8898 


564 


564 


180 


F 



D420 is a diagram contributing to the 2-loop static quark potential on Fig. IA.1I 
with the following set of propagators: {—A; 2 , —{k — q) 2 , — / 2 , —(I — q) 2 , —(k — 
I) 2 , —vk, — vl}, where k and / are loop momenta, q 2 = —1, qv = and v 2 = 1. 

"F" means that the sector decomposition fails (we suppose an infinite loop). 
"M" means that the memory overflow happened during the sector decompo- 
sition on a 8Gb machine. We suspect D420 to be a counterexample to the 
STRATEGY_X 

Hence we recommend leaving the default option STRATEGY=STRATEGY_S (our 
strategy) or experimenting with STRATEGY=STRATEGY_X (based on the ideas of 
Binoth and Heinrich [2], not guaranteed to terminate). 
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Fig. A.l. A diagram contributing to the 2- loop static quark potential 
B An example 

Here we provide the listing of the Mathematica session for evaluation of the 
massless on-shell box diagram with S = 3 and T = 1 (we split some long 
lines). 

In[l]:= « FIESTA_1.0.0.m 
FIESTA, version 1.0.0 

In [2]:= UsingQLink=False ; 

In [3] : = SDEvaluate [UF [{k} , {-k~2 , - (k+pl) "2 , - (k+pl+p2) "2 , 
- (k+pl+p2+p4) "2> , {pl"2->0 ,p2"2->0 ,p4"2->0 , 
pi p2->-S/2,p2 p4->-T/2,pl p4->(S+T)/2,S->3, 
T->1>] ,{1,1,1,1},0] 

External integration ready! Use CIntegrate to perform calls 

FIESTA, version 1.0.0 

UsingC: True 

NumberOf Links : 1 

UsingQLink: False 

IntegrationCut : 

If Cut: 0. 

Strategy: STRATEGY_S 

Integration has to be performed up to order 

Sector decomposition 0.0138 seconds; 12 sectors. 

Variable substitution 0.0055 seconds. 

Decomposing ep-independent term 0.0025 seconds 

Pole resolution 0.0081 seconds; 40 terms. 

Expression construction 0.0021 seconds. 

Replacing variables 0.0063 seconds. 

Epsilon expansion 0.0123 seconds. 

Expanding 0.0002 seconds. 

Counting variables: 0.0002 seconds. 

Preparing integration string 0.0004 seconds. 

Terms of order -2: 8 (1-fold integrals). 
Numerical integration: 8 parts; 1 links; 
Integrating 0.106322 seconds; returned answer: 
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1.333333 

Integration of order -2: 1.333333 
(1. 333333) /ep~2 

Expanding 0.0005 seconds. 

Counting variables: 0.0005 seconds. 

Preparing integration string 0.001 seconds. 

Terms of order -1: 28 (2-fold integrals). 
Numerical integration: 12 parts; 1 links; 



Integrating 0.409080 seconds; returned answer: 

-2.065743 +- 5.*"-6 
Integration of order -1: -2.065743 +- 5.*~-6 
(1. 333333) /ep~2 + (-0.73241 +- 5.*~-6)/ep 
Expanding 0.0008 seconds. 



Counting variables: 0.0009 seconds. 

Preparing integration string 0.0022 seconds. 

Terms of order 0: 40 (2-fold integrals). 

Numerical integration: 12 parts; 1 links; 

Integrating 0.862786 seconds; returned answer: 

-3.417375 +- 0.000012 
Integration of order 0: -3.417375 +- 0.000012 
(1. 333333) /ep"2 + (-0.73241 +- 5.*~-6)/ep + 

(-4.386496 +- 0.000013) 

Total time used: 1.52991 seconds. 

-6 

1.33333 -0.73241 + 5. 10 pm46 

Out [3]= -4.3865 + + 

2 ep 

ep 

+ 0.000013 pm47 
In [4] := Quit 

The analytical answer for this integral is 

_4_ 2 log(3) 4tt 2 
3e* 3e 9~ 



which is equal to 



1.33333334 - 0.7324081- - 4.3864908. 

e z e 



C Dealing with negative terms in F 



FIESTA has been used to verify master integrals for the three-loop static quark 
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potential [12]. A typical phenomenon when evaluating those integrals is that 
the function F can involve terms like ( 1 — 2xiXj +x]- As a result, 

not all monomials of F have positive signs, resulting in extra singularities for 
the integration. 

To overcome those problems we added an additional feature to FIESTA: an 
attempt to get rid of negative summands in F before performing the sector 
decomposition (those summands may lead to additional singularities that are 
not treated properly by the basic approach). The code tries to find two vari- 
ables, Xi and Xj, split the unitary cube into two regions with Xi < Xj and 
Xi > Xj and do a variable replacement (e.g. for Xi < Xj we make the replace- 
ment i). In case of success, we result in a new function F without 
monomials with negative sign. 

The ideas of this type are not original; it is a common situation when certain 
classes of integrals require a special treatment, in the papers of Binoth and 
Heinrich [2] one can find various examples where special operations preced- 
ing the sector decomposition are performed. Still we consider our automatic 
approach to the problem to be original: one does not require to specify the 
singularities of this type for the code to resolve them. 



D Avoiding memory limits in Mathematica 

Wolfram Mathematica, being a powerful CASGO is widely used among re- 
searchers nowadays. However, a significant drawback of the Mathematica us- 
age is the lack of RAM that is often encountered. FIESTA also had memory 
problems at the ^-expansion stage: an expression for a single sector can be- 
come huge after the expansion, and in complicated examples one can have 
many thousands of sectors. To avoid memory problems we used the QLink 
program to store data on hard disk. 

After performing the sector decomposition we store all the expressions in a 
database via QLink. Afterwards the code never loads the whole set of expres- 
sions into RAM. On the contrary, the code works with sectors one by one: 
it loads an expression from the database, performs an operation with it and 
writes it back to the database. The operation might be one of the following: 
singularity resolution, e-expansion, applying the Expand function or applying 
"if conditions" (see appendix [E]) . 

It is worth noting that the ^-expansion does not use the Mathematica Series 
function but explicit differentiation instead. This has two reasons: first of all, 

3 Computer Algebra System 



1(3 



differentiation with subsequent expansion works faster in Mathematica 6.0 
for the expressions we consider (remember that FIESTA uses Mathematica 
6.0 because of the powerful build-in integration methods that were absent in 
5.2). The second reason is to perform differentiation at the very beginning 
but to wait with applying the Expand function (which takes more time than 
the differentiation!). The latter function can be applied simultaneously with 
the integration at a later stage: we prepare expressions of order o + 1 while 
integrating expressions of order o. 

Such an approach allows to keep the memory usage by Mathematica minimal 
and to leave most of the memory to the CIntegrate copies. 



E Resolution of singularities and its influence on the convergence 
of numerical integration 

As it has been noted in section [21 the integrals 



j dxi . . . dx n C[[ x a / 1 )Z, (E.l) 

where Z has no singularities might have singularities because of non-positive 
Oj. Let us assume that a, < and treat the integrand as a function x ai ~ 1 Y{ai) 
with coefficients being polynomials of other variables. For readability we will 
omit the index i. We replace x a ~ 1 Y(a) by 

yfO)/" 1 + Y'(0)x a + ... + -^y ( -° ) (0)a;- 1 + 

a! 

x a - l (Y(a) - WO) - Y'(0)x - ... - ^-W" a) (0)) 

a! 

The items in the first line of the expression can be analytically integrated 
over x leaving us with one integration less. As for the expression R(a) = 
x a ~ 1 (Y(a)— Y(0)— Y' (0)x— . . . — ^,W _a )(0)), it is known to have no singularities 
at x = 0, however it might result in numerical instabilities at a later stage. 

After performing the resolution of singularities and the e-expansion, one ob- 
tains a set of integrands depending only on integration variables over the unit 
hypercube. The integrands arise from the expressions like R (or even more 
complicated ones in case of several integration variables). 

Those integrands have no singularities, so one could expect that the integration 
is performed with ease. However, one can see that for small x the x a ~ x is huge 
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and the remainder is small. Hence the remainder can't be evaluated exactly 
because it is a difference of not-so-small numbers stored with a finite number 
of digits. 



The first step to the solution is to expand the expressions before evaluating. 
It leaves similar problems (a difference of huge numbers resulting in a normal 
number), but with a smaller numerical instability. Still, in highly degener- 
ate examples the problem persists and prevents the code from evaluating the 
function when the variables turn small. 



FIESTA has two methods to deal with the numerical instability. The first 
method is the usage of the IntegrationCut option. This option sets the low 
integration limit to IntegrationCut (instead of zero). This approach may 
lead to high error estimate, so there is a more advanced method (slower but 
providing better results). 



The idea of the second method is to expand the integration function over the 
integration variables around zero up to a certain order (not all integration vari- 
ables require the expansion, but only the ones with a negative exponent in the 
preceding monomial). Then one can integrate the original function for values 
greater than some fixed number and the expanded one for smaller values. The 
code is instructed to work this way by the If Cut and VarExpansionDegree 
options. Please note that even for IfCut=10 -2 and VarExpansionDegree=l 
one results in an 10 -6 relative error estimate for the remainder that is smaller 
than the integration error estimate for complicated examples. 



There is one more technical problem in the second approach: the usage of 
the Mathematica Series command can be quite slow. Hence to expand the 
function we need to substitute x = into the function and its derivatives. The 
direct substitution is impossible for the same numerical reasons. The solutions 
comes with finding the minimal possible s such that (x s f)(0) can be evaluated 
numerically. Now we can define a new function g = — and use the fact 
that /W(0) = C l +'g®(0) to expand the function. 



Please note that the error estimate in the output is the one that comes from 
the integration; if the cuts are used, one should keep in mind that the real 
error might be greater for the reason that a function different from the original 
one is integrated. 
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F External integration 

F. 1 Overview 

For the integration we use the well-known Vegas algorithm [T3"]m] which is 
based on importance sampling. It samples points from the probability distri- 
bution described by the function |/|, so that the points are concentrated in 
the regions that make the largest contribution to the integral. In the present 
version of FIESTA we use the FORTRAN code linked with the C interface to 
Mathematica. The VEGAS routine requires an integrand as a reference to a 
function of an array of the integration variables x, the number of repetitions 
and the number of sampling points. 

The Mathematica part provides the integrand in a form of a long symbolic 
algebraic expression consisting of the integration variables, powers and loga- 
rithm0, e.g.: 

(l+x[l])*p[x[l]+x[l]*x[2] ,-3]+ l[x[3]]/33 

where p[x[l]+x[l]* x [2] ,-3] stands for (x[l) + x[l] * x[2])~ 3 andl[x[3]] 
for log(x[3]). 

The common approach is based on generating the corresponding C (or FOR- 
TRAN) code, compiling it, linking with some integration routine (VEGAS in 
our case), running the resulting program and reading back the answer. How- 
ever, in our case this approach does not work properly: it is specific to the 
sector decomposition that there can be many thousands of sectors. This leads 
to a big overhead for loading a compiler, a linker, writing the results to a file, 
loading and starting the resulting program and reading the results back. 

Next, the size of the corresponding expressions to be integrated may vary from 
several bytes to hundreds of megabytes. Unfortunately, all of the existing C or 
FORTRAN compilers are strongly nonlinear with respect to the length of the 
input expression. The reason is that these compilers usually use the so-called 
AST (Abstract Syntax Tree, see [16]) which is a tree representation of the 
syntax: each node represents constants or variables (leaves) and operators or 
statements (inner nodes). The complexity of this tree grows rapidly with the 
size of the incoming expression. In practice, it is nearly impossible to compile 
an expression of more then a few megabytes length. 

In order to compile a huge expression one can use some stack-based program- 



4 For the If Cut method (see Appendix |EJ) the if -then-else construction is also 
used, in the form if (condition) (first branch) (second branch). 
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ming language like FORTH [T7] but the performance of programs written on 
these languages is usually not so good as the performance of C or FORTRAN 
programs. 

One of the standard approaches consists in optimizing the algebraic expres- 
sion by means of some CAS tool like the Maple package "CodeGeneration" 
(formerly "codegen" ) [18] and compiling the resulting C (or FORTRAN) code 
without optimization. This cannot solve our problem since the number of ex- 
pressions is too big, anyway, and the resulting (optimized) code sometimes is 
still rather large. Moreover, the time the CAS spends to generate such an op- 
timized code is too large. So we decided to develop an interpreter which is able 
to evaluate a given expression in a "data-driven" manner: a translator trans- 
lates the incoming expression into some internal representational and then an 
interpreter evaluates the expression many times for various input data. It is 
worth noting that no intermediate AST is generated, only the linear sequence 
of triples. This permits our translator to be almost linear w.r.t. the length of 
the incoming expression. 



F. 2 Translation 

The incoming expression consists of (many) terms. Each term is a product 
of a coefficient and a number of integration variables (possibly zero) in some 
powers, some prescribed functions of these variables and subexpressions of the 
same structure (recursively). The translator converts this expression into a se- 
quence of triples, being "an operation" , "first operand" and "second operand" ; 
e.g. the expression 1+x [1] +x [1] *x [3] will be represented as a following se- 
quence of triples: 

1: ' + ', 1, x[l] 
2: '*', x[l], x[3] 
3: ' + >, ~1, ~2 

where "1 is a reference to the result of the first triple evaluation, "2 to the 
second one. A term may involve some functions and subexpressions, e.g. 
x [1] +p [x [1] , -2] * (x [2] +x [3] ) will be translated to 

1: 'p', x[l], -2 

2: ' + ', x[2], x[3] 

3: '*', "1, ~2 

4: ' + ', x[l], "3 



5 We use the triple form, see [T6] . 
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During translation, the subexpression optimization is performed in the fol- 
lowing way: the translator searches for each new triple in the dictionary. If 
the triple has already occurred, the translator does not put the triple into the 
resulting sequence, referencing the old one instead. Note, the translator input 
is already sorted since we get it from Mathematica so at least all coinciding 
monomials will be evaluated only once. 

This simple prefix optimization is surely not perfect. Let us consider the fol- 
lowing expression: 

( 1+x [1] +x [2] *x [4] ) * ( 1+x [2] *x [4] +x [2] *x [3] *x [4] ) 

The terms are sorted but the subexpressions 1+x [2] *x [4] in brackets will not 
be recognized, and x [2] *x [4] in x [2] *x [3] *x [4] also will not be identified. 
Of course, we might have implemented all possible combinations, but the 
resulting time would grow factorially w.r.t. the length of the input string, 
which is unacceptable for us. Hence we implemented the following strategy: 

Let us consider a term. Let us assume it has an implicit coefficient 1 in 
front. We consider every factor as a "commuting diad" of a form "opera- 
tion", "operand", e.g. 3*x [1] *x [3] /x [2] will be considered as a set of the 
following commuting diads: 

, 3 

, x[l] 
'*' , x[3] 
'/', x[2] 

Now we can re-order all these diads without affecting the result. We try to 
evaluate such a sequence in some "native" order, e.g. in the above example 
this would be like the following: 

, x[l] 
'/', x[2] 
, x[3] 
, 3 

We put the coefficient to the end, so terms differing by a coefficient will 
be evaluated only once. It is also worth noting that in each subexpression 
Mathematica already sums up all similar terms. 

The same idea works out (even better) for summands, the only difference is 
the implicit initial value instead of 1. 

This works well, reducing the number of generated triples by an order of 
magnitude. The translation procedure is in the worst case quadratic w.r.t. to 
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the length of the input string, in average it is a bit worse than linear. Typically, 
in a 3GHz Xeon processor the translator spends about 30 seconds to translate 
a gigabyte expression, which is incomparably less than the integration time. 

F.3 Evaluation 

The sequence of the translated triples is completely platform independent, 
and it is rather straightforward to generate (directly to the memory) the bi- 
nary executable code for any existing architecture, and execute it. For the 
moment, we've restricted ourselves by developing the interpreter suitable for 
every platform. 

First of all, the triple sequence we have from the translation stage, is trans- 
lated into the array of quadrples: "operation" , "operand 1" , "operand 2" and 
"result" . The fields "operand 1" and "operand 2" are pointers to some memory 
addresses. The "result" field is either the result of the quadruple evaluation 
(double type in C), or the address of the memory cell in which the result 
must be stored. The point is that the size of the address field on 64 bit plat- 
forms is the same as the size of double, and the usage of indirect references 
is rather slow, so it seems to be reasonable to use a new memory cell for each 
intermediate result. 

However, on a modern CPU such an approach appears to be completely wrong. 
Modern processors have a big write-back cache memory (megabytes), and 
provided the same memory address is permanently updated it is never written 
to RAM but stays in the cache. On the other hand, if we write intermediate 
results every time to a new address, all this data is sooner or later written 
to RAM (after the cache exhausted). That is why for "small" jobs, when all 
intermediate results fit into the processor cache, the interpreter stores results 
directly to the "result" field of a quadruple, while for "large-scale" jobs the 
translator creates some buffers and provides the quadruples with re-usable 
addresses of elements of these buffers. 

The algorithm switches from the "small" to "large" model when the number 
of generated triples reaches some threshold which strongly depends on the size 
of the processor L2 (L3) cache per active core and it is a subject for tuning 
for every specific architecture. The corresponding value is hard-coded, it can 
be changed in the file scanner. h, the macro 
INDIRECT_ADDRESSING_THRESHOLD. 

The complete structure of the C-part is as follows. The routine AddStringO 
builds the integrand step by step collecting the incoming lines. Every moment 
the process may be canceled by invoking the function ClearStringO . After 
(almost) all of the lines are collected, the routine Integrate () adds the rest 
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and invokes the translator. After the incoming expression is translated into the 
quadruple array the routine Integrate () invokes the VEGAS routine for 5 
repetitions with 10000 sampling points in order to "warmup" the grid. Finally, 
the VEGAS routine is invoked by the Integrate () routine for 15 repetitions 
with 100000 sampling points for the real integration^!. 



G Integration of sums vs sums of integrations 



The current version of FIESTA integrates in each sector separately (each sector, 
not just each primary sector). The results are summed up and the error esti- 
mate is calculated with a mean-square norm. This approach might encounter 
a natural suspicion: the errors add up and might result in an absurd result. 
However practice shows that the real error of this method is normally only 
about 20 percent greater that the error of the other approach — integrating 
the sum of expressions. 

The reason lies within the adaptable integration algorithms: if each of the 
terms is integrated individually, the algorithm can adapt to all the peaks of 
the given term. Hence the error estimate for the individual terms might be 
good enough. However when one tries to evaluate the whole sum at once, the 
peaks and other peculiarities of the different terms collide and the integration 
algorithm cannot adapt to all of them properly. 

Another reason to integrate in each sector individually is obviously the econ- 
omy of RAM. And even more, this approach suits well for parallel environ- 
ments due to much better load balancing. 

For the CIntegrate interpreter short (but not so-short) expressions are also 
preferable. 

At first sight, long expressions could be optimized much better than the short 
ones for the following reason: the longer the expression, the more common 
subexpressions are encountered, and all the subexpressions should be evalu- 
ated only once. But for a long expression the program must use the indirect 
referencing for quadruple results (see Appendix IF.3j) which can slowdown the 
interpreter considerably. 

In general, all these effects speedup the evaluation up to an order of magnitude 
when each individual sector is integrated independently on others. 

6 This is the default behavior; the exact number of sampling points can be set by 
the VegasSettings option, see Section [6l 
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